Using Concept-Based Indexing to Improve Language Modeling Approach to Genomic IR

Authors

  • Xiaohua Zhou
  • Xiaodan Zhang
  • Xiaohua Hu
Abstract

Genomic IR, characterized by highly specific information needs, severe synonymy and polysemy, long term names, and a rapidly growing literature, poses a challenge to the IR community. In this paper, we focus on addressing synonymy and polysemy within the language modeling framework. Rather than following the approaches of the translation model or traditional query expansion techniques, we incorporate concept-based indexing into a basic language model for genomic IR. In particular, we adopt UMLS concepts as indexing and search terms. A UMLS concept stands for a unique meaning in the biomedical domain, so a set of synonymous terms shares the same concept ID. The new approach therefore makes document ranking more effective while maintaining the simplicity of language models. A comparative experiment on the TREC 2004 Genomics Track data shows that incorporating concept-based indexing into a basic language model yields significant improvements: the MAP (mean average precision) rises from 29.17% for the baseline system to 36.94%. The new approach also performs significantly better than the mean (21.72%) of the official runs submitted to the TREC 2004 Genomics Track and is comparable to the best run (40.75%). Most official runs, including the best run, make extensive use of query expansion and pseudo-relevance feedback techniques, whereas our approach adds nothing beyond concept-based indexing. This supports the view that semantic smoothing, i.e., incorporating synonym and sense information into language models, is a more standard way to achieve the effects that traditional query expansion and pseudo-relevance feedback techniques target.
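A minimal sketch of the idea described above, assuming a query-likelihood language model with Jelinek-Mercer smoothing (the abstract does not name the smoothing method, and the weight `lam=0.5` is purely illustrative). The synonym table `CONCEPT_OF` and its IDs are hypothetical stand-ins for UMLS concepts, which in practice would come from a concept extractor; letting tokens without a concept fall back to the word itself is a further assumption. Because `tp53`/`p53` and `apoptosis`/`programmed_cell_death` map to shared IDs, a document can match a query without sharing any of its surface terms.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for UMLS: synonymous surface terms
# share one concept ID. These IDs are made up for illustration only.
CONCEPT_OF = {
    "p53": "CONCEPT_A",
    "tp53": "CONCEPT_A",
    "apoptosis": "CONCEPT_B",
    "programmed_cell_death": "CONCEPT_B",
}


def to_concepts(tokens):
    """Replace each token by its concept ID; unmapped tokens are kept as-is (assumption)."""
    return [CONCEPT_OF.get(t, t) for t in tokens]


def collection_model(docs):
    """Concept counts and total length over the whole collection."""
    counts = Counter(c for doc in docs for c in to_concepts(doc))
    return counts, sum(counts.values())


def score(query, doc, coll_counts, coll_len, lam=0.5):
    """Query log-likelihood over concepts with Jelinek-Mercer smoothing (assumed)."""
    doc_counts = Counter(to_concepts(doc))
    doc_len = sum(doc_counts.values())
    total = 0.0
    for c in to_concepts(query):
        p_coll = coll_counts[c] / coll_len
        if p_coll == 0:
            # Concept absent from the whole collection: it would zero out every
            # document's likelihood, so it carries no ranking information.
            continue
        p_doc = doc_counts[c] / doc_len if doc_len else 0.0
        total += math.log((1 - lam) * p_doc + lam * p_coll)
    return total


# Toy collection and query (illustrative data, not from the paper).
docs = {
    "d1": ["tp53", "regulates", "apoptosis"],
    "d2": ["p53", "binds", "dna"],
    "d3": ["insulin", "signaling"],
}
query = ["p53", "programmed_cell_death"]
coll_counts, coll_len = collection_model(docs.values())
ranking = sorted(docs, key=lambda d: score(query, docs[d], coll_counts, coll_len), reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']
```

Here `d1` ranks first although it contains neither literal query word; with plain word-based indexing it would match no query term at all, while `d2` would match only `p53`.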

Similar resources

Multi-lingual Indexing Support for CLIR using Language Modeling

An indexing model is the heart of an Information Retrieval (IR) system. Data structures such as term-based inverted indices have proved to be very effective for IR using vector space retrieval models. However, when functional aspects of such models were tested, it was soon felt that better relevance models were required to more accurately compute the relevance of a document towards a query. It ...

Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies

OBJECTIVE The aim of this work is to evaluate a set of indexing and retrieval strategies based on the integration of several biomedical terminologies on the available TREC Genomics collections for an ad hoc information retrieval (IR) task. MATERIALS AND METHODS We propose a multi-terminology based concept extraction approach to selecting the best concepts from free text by means of voting techniq...

STD based on Hough Transform and SDR using STD results: Experiments at NTCIR-9 SpokenDoc

In this paper, we report our experiments at the NTCIR-9 IR for Spoken Documents (SpokenDoc) task. We participated in both the STD and SDR subtasks of SpokenDoc. For the STD subtask, we applied a novel indexing method, called metric subspace indexing, previously proposed by us. One of the distinctive advantages of the method was that it could output the detection results in increasing order of distance witho...

Development and Validation of Teacher Emotional Support Scale: a structural equation modeling approach

Reviewing the literature indicated that no validated model was found that examines the extent to which teachers support their students emotionally in EFL classrooms. Therefore, the present study elaborated on this issue through developing and validating a teacher emotional support scale in an Iranian English foreign language context. Main components of the scale have been specified based on Hamre...

Using Term Sense to Improve Language Modeling Approach to Genomic IR

Genomic IR, characterized by highly specific information needs, severe synonymy and polysemy, long term names, and a rapidly growing literature, poses a challenge to the IR community. In this paper, we focus on addressing synonymy and polysemy under the language modeling framework. Unlike the ways the translation model and traditional query expansion techniques approach this issue,...

Publication date: 2006